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ABSTRACT 

Upcoming high spectral resolution telescopes, particularly Astro-H, are expected to 
finally deliver firm quantitative constraints on turbulence in the intra-cluster medium 
(ICM). We develop a new spectral analysis technique which exploits not just the line 
width but the entire line shape, and show how the excellent spectral resolution of 
Astro-H can overcome its relatively poor spatial resolution in making detailed infer- 
ences about the velocity field. The spectrum is decomposed into distinct components, 
which can be quantitatively analyzed using Gaussian mixture models. For instance, 
bulk flows and sloshing produce components with offset means, while partial volume- 
filling turbulence from AGN or galaxy stirring leads to components with different 
widths. The offset between components allows us to measure gas bulk motions and 
separate them from small-scale turbulence, while component fractions and widths con- 
strain the emission weighted volume and turbulent energy density in each component. 
We apply mixture modeling to a series of analytic toy models as well as numerical simu- 
lations of clusters with cold fronts and AGN feedback respectively. From Markov Chain 
Monte Carlo and Fisher matrix estimates which include line blending and continuum 
contamination, we show that the mixture parameters can be accurately constrained 
with Astro-H spectra: at a ~ 10% level when components differ significantly in width, 
and a ~ 1% level when they differ significantly in mean value. We also study error 
scalings and use information criteria to determine when a mixture model is preferred. 
Mixture modeling of spectra is a powerful technique which is potentially applicable to 
other astrophysical scenarios. 



1 INTRODUCTION 



Turbulence in galaxy clusters can arise from many sources, 
ranging from mergers and co s molog i cal structure for- 
mation JTau. Kravtsov fc Nagail I200E ; IVazza et all 120091 . 
l201ll ; IZuHone. Markevitch fe Johnsonl feoioh . to galactic 
wakes (iKirrj |2007l; IRuszkowski fc Ob] l201ll ), and AGN 
feedback (McNam ara fc Nulsenl l|2007h . and references 
therein). It is generally expected to be highly sub- 
sonic, with Mach numbers M ~ 0.1 — 0.5. Turbulence 
has wide-ranging and pivotal effec ts on ICM physics . 
It could dominate me tal transport (|Rebusco et al.l 120051 ; 
ISimionescu et al1l2008l ). accelerate pa rticles, as required in 
a prominent model of ra dio halos (|Brunetti et al.l l200ll ; 



iBrunetti fc Laz arian 2007 ), generate and amplify magnetic 
fields JSubramanian. Shukurov fe~ Haugenl 120061 ; iRvu et aLl 
120081 ; ICho et al.ll2009l ; iRuszkowski et alj|201ll ). and provide 



pressure support, thus impact ing X-ray mass measurements 
IlLau. Kravtsov fc Nagaill2009l ). and Suny aev-Zeldovich (SZ) 
measurements of the t hermal pressure (|Shaw et al.l l20ld : 
iBattaglia et al.ll2011al lbl; iParrish et al.ll2012h . The unknown 
level of non-thermal pressure support introduces system- 
atic deviations in the mass calibration of clusters and could 
strongly affect their use for cosmology. A particularly in- 



teresting effect of turbulence is its impact on the thermal 
state of the gas, potentially allowing it to stave off catas- 
trophi c cooling. It can do by dissipation of turbulent mo- 
tions (Ch urazov et a l. 2004; Kunz et al. 20l j), or turbulent 
diffusion of heat JCho et al.l 120031 ; iKim fc NaravarJ 120031 ; 
iDennis fc Chandranf 20051 ). More subtly, it can do so by af- 
fecting magnetic field topology; by randomizing the B-field, 
it can restore thermal conduction to ~ 1/3 of the Spitzer rate 
(IRuszkowski fc Ohll2010l .l 201ll ; |Parrish. Quataert fc Sharmal 
l201Ch . Besides turbulence, a variety of bulk motions such 
as streaming, shocks, and sloshing have been observecQ. 
Such (often laminar) gas motions are interesting in their 
own right. For instance, gas sloshing in the potential well 
of clusters, which produces observed cold fronts — contact 
discontinuities between gas of very different entropies — 
has gleaned information about hydrodynamic instabilities, 
magnetic fields, thermal condu ctivity and viscosity of ICM 
IjMarkevitch fc Vikhlininll2007l . and references therein). 



1 While rotation has not been directly seen, it is also expected 
from cosmological simulations jLau et alj|201lh . Its effects are 
generally too small to be detected by the methods discussed in 
this paper. 
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Current observational constraints on ICM turbulence 



are fairly weak, and mos tly indirect. They com 
analysis of pressure maps (|Schuecker et alj|2004f). 



the lack of 



detection of resona nt-line scattering ( Churazov et al. 20041; 
Werner et afll201Ch. Farad ay rotation maps (jVort fc Enfilinl 



20051 ; lEnfflin fc VogtJI 200(1 . and deviations from hydrostatic 



equil i brium with thermal pressure alone ijChurazov et al.l 
l2008l ; IChurazov et~al]|2010l ; IZhang et al.ll200Sl ). In general, 
these studies constrain cluster cores and either place up- 
per bounds on turbulence, or indicate (with large uncer- 
tainties) that it could be present with energy densities 
~ 5 — 30% that of thermal values. The energy density 
in turbulence is expected to increase str ongly with radius 
l|Shaw et alJl20ld :lB attaglia et alJlioilal ). though observa- 
tional evidence for this is indirect. The most direct means 
to constrain gas motions is through Doppler broadening of 
strong emission lines, but this remains undetected with cur- 
rent technology. By examining the wid ths of emission lines 
with XMM RGS, ISanders et all (|2010h found a 90% upper 
limit of 274 kms -1 (13% of the sound speed) on the turbu- 
lent velocity in the inner 30 kpc of Abell 1385; analysis of 
other systems provides m u ch we aker bounds ( <c500kms -1 ; 
ISanders. Fabian fc Smith] J201ll)'). Numerical simulations 
provide further in s ights (e.g. lLau. Kravtsov fc Nagai 12009] ; 
IVazza et all 120091 . l2010h . However, due to limited resolu- 
tion and frequent exclusion of important physical ingredi- 
ents such as AGN jets, magnetic fields, radiative cooling, 
anisotropic viscosity, it is difficult to draw robust conclu- 
sions. 

The forthcoming Astro-H missio r0 (launch date 2014) 
represents our best hope of gaining a robust understanding 
of gas motions in IGVo With unprecedented spectral reso- 
lution (FWHM ~ 4-5 eV), Astro-H could not only measure 
the widths of emission lines, therefore constraining the tur- 
bulent amplitude, but also probe the line shapes. Somewhat 
surprisingly, very few studies have been conducted to ex- 
tract velocity information from the shape of emission lines 
in a realistic observational context. Current work has fo- 
cused on studying a Gaussian approximation to the line 
ijRebusco et al.ll2008l ). and interpre ting the radial variatio n 
of the line w idth and line cen t er (jzhuravleva et al.ll2012T l. 
For instance, IZhuravleva et al.l (|2012T ) show how the radial 
variation of line width is related to the structure function of 
the velocity field, and also how the 3D velocity field can be 
recovered from the projected velocity field. However, such 
inferences generally require angular resolution comparable 
to characteristic scale lengths of the velocity field, and are 
likely feasible only for one or two very nearby clusters such 
as Perseus (though such studies do represent a very exciting 
possibility for ATHENA). At the same time, it has long been 
apparent that turbulence in clusters leads to significant non- 
Gaussianity in the line shape — i ndeed, these were clearly vis- 
ible in the early simulations o f ISunvaev. Norman fc Brvanl 
i|2003h . Ilnogamov fc Sunvaevl (|2003r i presented a deep and 
insightful discussion of the origin of line shapes, albeit in an 
idealized Kolmogorov cascade model for cluster turbulence 



2 http://astro-h.isas.jaxa.jp/ 

3 Much farther in the future, the ATHENA mission 
(http://sci.esa.int/ixo) could significantly advance the same 
goals. 



Table 1. Specifications of the Soft X-ray Spectroscopy System 
onboard the Astro-H telescope. 



Effective area cm 2 at 6 keV 225 

Energy range (keV) 0.3-12.0 

Angular resolution in half power diameter (arcmin) 1.3 

Field of view (arcmin 2 ) 3.05 X 3.05 

Energy resolution in FWHM (eV) 5 



(for instance, they do not consider the effect of gas slosh- 
ing and cold fronts). Heuristically, one can consider non- 
Gaussianity to arise when the size of the emitting region 
(heavily weighted toward the center in clusters) is not much 
larger than the characteristic outer scale of the velocity field. 
The central limit theorem does not hold as the number of in- 
dependent emitters is small (and/or in large scale bulk flows, 
the motion of different emitters is highly correlated). In a 
series of papers, Lazarian and his collaborators considered 
the relationship between the t urbulent spectrum and the 
spectral line shape in the ISM (lLazarian fc Pogosvan|[2000l . 
120061; IChepurnov fc LazariaifeOOrJ ). However, since they fo- 
cused on the supersonic and compressible turbulence seen 
in the ISM — a regime where thermal broadening is negligi- 
ble and density fluctuations are considerable — the methods 
they employ are not readily suitable for the mild subsonic 
turbulence expected in the ICM. 

We therefore aim to study how velocity information can 
be recovered from the emission line profile in the ICM con- 
text, in a realistic observational setting. In particular, we 
advance the notion that the profile can be separated into dif- 
ferent modes, which have a meaningful physical interpreta- 
tion. As we will discuss in more detail below, many processes 
in the ICM could give rise to velocity fields composed of dis- 
tinct components. For instance, the sharp contact disconti- 
nuity in velocity in cold fronts will give rise to a bimodal ve- 
locity field where one component is significantly offset from 
another. Another interesting scenario arises if turbulence is 
not volume-filling (due, for instance, to anistropic stirring by 
AGN jets). Then spectral lines of different width (with and 
without turbulent broadening) will be superimposed on one 
another. When seen in the same field of view (FOV), these 
components correspond to different modes in the line pro- 
file, and decomposing the velocity field into dominant modes 
can yield valuable quantitative information (for instance, the 
volume filling factor). For the upcoming Astro-H mission, 
mode separation in the spectrum is necessary and impor- 
tant since the poor angular resolution make it hard to spa- 
tially resolve different components — indeed, the high spec- 
tral resolution of Astro-H is our best tool for inferring the 
complex structure of the velocity field. We use standard mix- 
ture modeling techniques and Fisher matrix/Markov chain 
Monte Carlo error analysis to quantify how well we could 
separate and constrain different components from a single 
spectrum, and then establish what we can learn from about 
the underlying velocity field from such a component separa- 
tion. 

Before proceeding to the main discussion, we first list a 
few specifications of the Astro-H mission, on which our dis- 
cussions are based. Our study mainly takes the advantage of 



© 0000 RAS, MNRAS 000, 000-000 



Probing Gas Motions in the Intra- Cluster Medium: A Mixture Model Approach 3 



Table 2. Photon counts N p in the He-like iron line and physical 
length corresponding to angular resolution in HPD (1.3 arcmin) 
for a few nearby galaxy clusters. The photons are accumulated 
from 1 FOV through the cluster center over 10 6 seconds. 



Cluster Name 


Redshift 


<2i.3 (kpc) 


Af p (xl0 4 phot) 


PERSEUS 


0.0183 


28.81 


5.8 


PKS0745 


0.1028 


146.76 


3.8 


A0478 


0.0900 


130.36 


3.7 


A2029 


0.0767 


112.79 


3.1 


A0085 


0.0556 


83.78 


3.1 


A1795 


0.0616 


92.17 


2.6 


A0496 


0.0328 


50.76 


1.9 


A3571 


0.0397 


60.94 


1.7 


A3112 


0.0750 


110.51 


1.7 


A2142 


0.0899 


130.23 


1.6 


2A0335 


0.0349 


53.87 


1.3 


HYDRA-A 


0.0538 


81.23 


1.3 


A1651 


0.0860 


125.13 


1.1 


A3526 


0.0103 


16.37 


0.8 



the high spectral resolution of the Soft X-ray Spectroscopy 
System (SXS) onboard the Astro-H telescope. Its proper- 
ties, taken from the "Astro-H Quick Reference'^, are given 
in Table [T] The energy resolution is 5 eV in FWHMB corre- 
sponding to a standard deviation of 2.12 eV. For comparison, 
the thermal broadening of the Fe 6.7 keV line is 2.07 eV for 
a 5 keV cluster, while broadening by isotropic Mach number 
A4 ~ 0.2 motions is 2.9 eV. Thus, for the highly subsonic 
motions in the core with Mach numbers M ~ 0.1 — 0.3 
generally seen in cosmological simulations, the instrumen- 
tal, thermal and turbulent contributions to line broadening 
are all roughly comparable. In contrast to the impressive 
energy resolution, the angular resolution of Astro-H is poor: 
1.3 arcmin in half power diameter (HPD). Therefore, differ- 
ent velocity components are likely to show up in the same 
spectrum. Based on these specifications, Table [2] shows the 
expected photon counts in the He-like iron line at 6.7 keV 
for a few of the brightest nearby clusters (z ^ 0.1). The 
photons are accumulated in one FOV through the cluster 
center over 10 6 seconds; the ~ several x 10 4 photons col- 
lected should allow good statistical separation of mixtures 
if present. The densi ty distribut i ons an d cluster tempera- 
tures are taken from Ichen et all (120071 ). and metallicity is 
assumed to be 0.3 Zq. Also shown are the physical lengths 
corresponding to the angular resolution in HPD, which are 
~100 kpc; comparable to the core size. We therefore do not 
expect Astro-H to spatially resolve many structures. 

The remainder of the paper is organized as follows. In 
§ [2 we discuss possible scenarios that could give rise to 
multiple component spectra, further motivating the current 
study. In § [3] we develop the methodology to be used in 
this paper. In § [3] we discuss how accurately different com- 
ponents could be recovered in idealized toy models, to build 
our understanding of the applicability and capabilities of the 
method. In § [S] we apply our statistical method to realistic 

4 http:/ /astro-h. isas.jaxa.jp/doc/ahqr.pdf 

5 It has shown t o be even lower— 4 eV— in laboratory tests 
llPorter et alj|2010h . 




Figure 1. Density map on a slice through the cluster center. The 
dashed line shows the direction along which the profiles in Fig. 
[3] are computed, while the perpendicular dotted line-chosen to 
maximize line of sight velocity shear -indicates the observation 
direction for the solid red velocity PDF shown in Fig. [4] The 
dotted red line shows an alternate viewing direction with much 
less velocity shear; its velocity PDF is given by the thin red line 
in Fig. [3] 



simulations of galaxy clusters, where we have full knowledge 
of the underlying velocity field, and see what information we 
can recover. In § [6] we conclude by summarizing the main 
results. 



2 MOTIVATION 

In this section, we motivate the current study by giving ex- 
amples of very common processes operating in the ICM 
which could give rise to multi-component velocity fields: 
bulk motions from mergers and sloshing, and AGN feed- 
back. 



2.1 Bulk Motions 

Thus far, most constraints on gas bulk motions comes from 
observations of sharp density gradients in the plane of 
the sky. Classic bow shocks have been seen in a handful 
of violent mergers. Much more common are "cold fronts" 
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Figure 2. Velocity fields on the same slide as in Fig. [T] over- 
laid with density (solid blue curves) and temperature (dashed red 
curves) contours. The large purple arrow indicates the location of 
the cold front. 
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Figure 3. Density (dashed curve), temperature (dotted curve) 
and linc-of-sight velocity (solid curve in the bottom panel) profiles 
along the a direction perpendicular to the cold front, as indicated 
in Fig. [T] with a dashed line. Here, the position of the cold front 
is given by the vertical (cyan) line at 85 kpc. 
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Figure 4. The thick solid (red) curve is the normalized emission- 
weighted velocity PDF from a box centered on the dotted line 
in Fig. [T] The box is 100 kpc long, 100 kpc wide and 1 Mpc 
deep. The thick dashed green curve, is the corresponding profile 
of the He-like iron line at 6.7 keV (see top axis for energy scale), 
including the effects of thermal broadening, while the thick dot- 
dashed purple curve also includes instrumental broadening. The 
thin lines show the same curves for the line of sight given by the 
dotted red line in Fig. IT1 



l|Markevitch &: Vikhlininl l2007h : sharp contact discontinu- 
ities between gas phases of different entropies, discovered in 
the last decade thanks to the high-resolution of the Chandra 
X-ray telescope. They are seen both in mergers (where the 
cold gas arises from the surviving cores of infalling subclus- 
ters) and relaxed cool core clusters (where they are produced 
by the displacement and subsequent sloshing of the low- 
entropy central gas in the gravitational potential well of the 
cluster). They are remarkably ubiquitous, even in relaxed 
cool core clusters with no signs of recent mergers, which of- 
ten exhibit several such cold fronts at different radii from 
the density peak. For instance, they are seen in more than 
half of all cool core clusters; given projection effects, most 
if not all cool core clusters should exhibit such features. Ev- 
idently, coherent gas bulk motions are extremely common 
if not universal, and their effects must be taken into ac- 
count when interpreting Astro-H spectra. Generically, we 
would expect bulk motions to offset the centroids of emit- 
ting regions with significant line-of-sight relative velocity. 
Cold fronts have been used to probe the amplitude and di- 
rection of gas motions in the plane of the sky; combining 
this with line-of-sight information from the spectrum could 
prove very powerful indeed. 

Our example is taken from an adiabatic numerical sim- 
ulation from cosmolog ical initial c o nditions with the adap- 
tive mesh code Enzo (|Brvanlll999l ; iNorman fc Brvanlll999l ; 
lO'Sheaet aill2004h . We assume a ACDM cosmology with 
cosmological pa rameters consistent w ith the seventh year 
WMAP results <|Komatsu et alj|201ll ): Q m = 0.274, fi A = 
0.726, Q b = 0.045, h = 0.705, a 8 = 0.810, n s = 0.96. The 
simulation has a box size of 64 Mpc, and a root grid of 128 3 . 



e Indeed, we show here the very first cluster we simulated from 
random initial conditions, which already exhibited cold front like 
features. 
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We picked the most massive cluster (M ~ 2 x 10 14 M ) 
from the fixed-grid initial run, and re-simulate it with much 
higher resolution. The highest spatial resolution is 11 kpc in 
the cluster center. 

The cluster has a disturbed morphology, and shows a 
"cold front" -like feature in the core. Note that our adiabatic 
simulation necessarily produces a NCC cluster. The density 
and velocity fields on a slice through the cluster center are 
shown in Fig. Q] and [21 respectively. In the position indicated 
by the large arrow in Fig. [2] the density, temperature and 
velocity all change rapidly. This is clearly shown in Fig. O 
which shows density, temperature and velocity profiles along 
a line perpendicular to the front (indicated in Fig. [1] with a 
dashed line). At ~ 85 kpc from the cluster center, the den- 
sity decreases while the temperature increases rapidly, as 
expected in a cold front (for a shock, the temperature jump 
would be opposite). Furthermore, the pressure is continu- 
ous across the front, while the tangential velocity changes 
direction discontinuous ly across the front — both well- known 
features of cold fronts IjMarkevitch fc Vikhli nin 2007). 

For an observation direction along the white dotted line 
in Fig.[TJ Fig. |4]shows the emission- weighted probability dis- 
tribution function (PDF) of the line-of-sight velocity. Moti- 
vated by Table [21 we extract the emission-weighted PDF 
from an volume with an area of 100 x 100 kpc 2 and a depth 
of 1 Mpc (this last number represents the line of sight depth, 
and is chosen for convenience. Our results are insensitive to 
it as long as it is much larger than the core size, where 
most of the photons come from). The PDF clearly shows 
two peaks, centered at -400 kms -1 and 250 kms , cor- 
responding to the gas on different side of the cold front. 
After convolution with thermal broadening, the dashed line 
shows the profile of the He-like iron line at 6.7 keV, while the 
dot-dashed line also includes the instrumental broadening of 
Astro-H. They also clearly show double peak features. 

The above case is a somewhat idealized "best case" sce- 
nario, where we have assumed the viewing angle to be along 
the direction of maximum line of sight velocity shear, thus 
maximizing the separation between the two peaks in the 
velocity PDF. For a more general viewing angle, the separa- 
tion would not be so clear, as we show with the thin curves 
in Fig. [J] This is the PDF along the red dotted curve in Q3 
which has very small line-of-sight bulk flow. There is only 
one large peak, but with a long tail. From Fig. [21 we see this 
long tail comes from the gas surrounding the cold clump, 
which has shear velocities with large components along the 
LOS. Therefore the PDF can also be separated into two com- 
ponents - a narrow component emitted by the cold clump 
and a broad component from the ambient gas. The offset 
between the components is a measure of the LOS contact 
discontinuity in the bulk velocity, while smaller scale shear 
contributes to the width. Such a decomposition of the line- 
of-sight velocity, combined with spatially resolved temper- 
ature and density information in the plane of the sky from 
X-ray imaging, could shed more light on the 3D velocity field 
as well as physical information such as the gas viscosity. 

2.2 Volume-filling Factor of Turbulence 

The previous section highlighted a situation where strong 
shear or bulk motion gives rise to different components with 
offset centroids ("separation driven" case). Another regime 



where different components could arise is when the two 
components have markedly different widths ( "width driven" 
case). We saw an example of this at the end of the previ- 
ous section: a narrow component due to a cold, kinemat- 
ically quiescent clump, and a broader component due to 
the sheared surrounding ambient gas. More generally, dif- 
ferent widths arise when turbulence varies spatially. The 
case when turbulence is only partially volume-filling is a 
particularly interesting special case. Many of the physical 
effects of turbulence depend not only on its energy den- 
sity, but its volume filling fraction /v, which is often im- 
plicitly assumed to be unity. For instance, for turbulence 
to stave off catastrophic cooling, it must be volume-filling. 
Thi s is by no means assured. For instance, analytic mod- 
els l|Subramanian. Shukurov fc Haugenll2006h of turbulence 
generation during minor mergers predict /v ~ 0.2 — 0.3 
to be small, but area-filling (i.e., the projection of turbu- 
lent wakes on the sky cover a large fraction of the cluster 
area, fs ~ 0(1)). Interestingly, cosmological AMR simu- 
lations which use vorticity as a diagnostic for turbulence 
find good agreement; /y <:0 .3 and fs ~ 0(1) for all runs 
ijlapichino fc Niemeverl2008fh In our own sim ulations of stir- 
ring by galaxies ( Ruszkowski fc o"hl l201l[ ) , we have seen 
both high and low values of fy, depending on modeling 
assumptions. If g modes are excited by orbiting galaxies 
(which requires the driving orbital frequency lo to be less 
than the Brunt- Vaisala frequency u>BV-a requirement which 
depends on both the gravitational potential and tempera- 
ture/entropy profile of the gas), then volume-filling turbu- 
lence is excited; otherwise turbulence excited by dynami- 
cal friction is pote ntially confined t o thin "streaks" be hind 
galaxies (see also iBalbus fc Sokeil l|l990l ): iKiml ||2007h ). If 
turbulence is patchy, we might expect spectral lines to have 
a narrow thermal Doppler core (produced in quiescent re- 
gions), with turbulently broadened tails. In the context of 
our mixture model, measuring the fraction of photons in the 
second component might allow a quantitive measure of /v. 

Yet another context in which strongly spatially vary- 
ing or partial volume-filling turbulence could result in mul- 
tiple components in the velocity PDF is AG N feedback, 
which is ubiquitous in most cool core clusters dBirzan et al. 
l|2004l) ; for a recent review, see iMcNamara fc Nulsen 
(2012)). AGN jets are launched over a narrow solid an- 
gle and are fundamentally anisotropic; thus, their abil- 
ity to sustain isotropic heating in the core has of- 
ten been questioned. Isotropization of the injected en- 
ergy could arise from weak shocks and sound waves 
(Fabi arTet al.ll2003T ). frequent re -orientation of jets by ran- 
domly orien ted accretion disks (iKing fc Pringld [20071). jet 
preccssic m (iDunn. Fabian fc Sandersl 120061 ; Gas pari et al.l 
120111). and cavities being blow n abou t by cluster weather 
ilBriiggen. Hoeft fc Ruszkowski 120051 ; iHeinz et al.l 120061 ; 
iMorsonv et al. 201Ch . As above. AGN could also excite g- 
mode oscillations; an intriguing example is a cross-like struc- 
ture on 100 kpc scales in the ICM surrounding 3C 401 
|Revnolds. Brenneman fc Stockel I2005T I . A measurement of 
/v could thus constrain the efficacy of such mechanisms in 
isotropizing AGN energy deposition throughout the core. 
The expansion of AGN-driven cavities can also introduce 
high bulk velocities and shear (corresponding more to the 
"separation-drive" regime); this is potentially directly mea- 
surable with ATHENA's excellent angular and spectral reso- 
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lution |Heinz. Briiggen fc Morsony ; 2010), but would require 
indirect methods such as mixture modeling with the poor 
angular resolution of Astro-H. In SjSl we analyze an AGN 
feedback simulation kindly provided to us by M. Briiggen. 



2.3 Physical Significance of Mixture Model 
Parameters 

In Sj3] we lay out our methodology for recovering mixture 
model parameters, and in subsequent sections we describe 
how accurately these can be constrained. These parameters 
are the mixture weights fi, and means and variances Hi, erf , 
of the fitted Gaussians. Given the results of this section, we 
can tentatively ascribe physical significance to these parame- 
ters. The mixture weights fi represent the emission-weighted 
fraction of the volume in each distinguishable velocity com- 
ponent. The Gaussian means pn represent the bulk velocity 
of a given component. In particular, the difference between 
the means is a measure of the LOS shear between these com- 
ponents (e.g., as arises at a cold front). Note that this shear 
due to bulk motions can be considerably larger than the cen- 
troid shift due to variance in the mean, induced by turbulent 
motion with a finite coherence length. The latter is given by 
Hi ~ o%l y/N, where N ~ L C mit jlv is the average number of 
eddies pierced by the line of sight, and Lemiti are the size 
of the emitting region and the coherence length of the veloc- 
ity fi eld respectively (jRebusco et al.1120081 ; IZhuravleva et al.l 
2012). The variances Oi represents turbulent broadening or 
shear due to the small scale velocity field. 



3 METHODOLOGY 

We have argued that the X-ray spectrum from galaxy clus- 
ters should have multiple distinct components. Uncover- 
ing these components is the domain of mixture modeling, 
a mature field of statistics with a large body of literature. 
We will specialize to the case of Gaussian mixture mod- 
eling, when Gaussians are used as the set of basis func- 
tions for the different components. This is an obvious choice, 
since thermal and instrumental broadening are both Gaus- 
sian, and turbulent broadening can be well approximated 
with a Gaussian when the injection scale is much smaller 
than t he size of the emitting regions |lnogamov fc Sunvaevl 
(2003); i.e., once coherent bulk motions have been separated 
out by classification into different mixtures, the remaining 
small scale velocity field is well approximated by a Gaus- 
sian). It is also by far the best studied case. Mixture mod- 
eling has been applied to many problems in astrophysics, 
such as detecting bimodality in globular cluster metallici - 
ties (|Ashman. Bird fc ZepJll994l ; lMuratov fc Gnedir]|2010t ) 
linear regression (lKelly|2007l). backg round-source separation 
jGuglielmetti. Fischer fc Dosell 2009), and detec ting variabil- 
ity in time-series (|Shin. Sekora fc Bvunl [2009), though to 
our knowledge it has not been applied to analyzing spec- 
tra. It should be noted that the specialization to Gaussian 
mixture is not necessarily restrictive; for instance, Gaussian 
mixtu res have been used to model qu asar luminosity func- 
tions (Kel ly. Fan fc Vestergaard1l2008l ). For us, the fact that 
Gaussians are a natural basis function allows us to model 
the spectra compactly with a small number of mixtures, and 
assign physical meaning to these different components. 



Consider a model in which the observations asi, . . . ,x n 
are distributed as a sum of k Gaussian mixtures: 

k 



f(x\6) = ^2u>ifj(x\nj,cr?) 



(1) 



2=1 



where fj (x\m , erf) are normal densities with unknown means 
fij and variances <t|, and ui, are the mixture weights. The 
parameters which must be estimated for each mixture are 
therefore 8j = (wj, <7?), and the function / can be viewed 
as the probability of drawing a data point with value x given 
the model parameters 9. Parameter estimation in this case 
suffers from the well-known missing data problem, in the 
sense that the information on which distribution j a data 
point Xi belongs to has been lost. In addition, the number 
of mixtures k may not be a priori knowrQ. Standard tech- 
niques for overcoming this are a variant of maximum likeli- 
hood techniques known as Expec tation Maximization (EM; 
iDempster. Laird fc RubirJ (1 19771 )). or Maximum a Posterior 
estimation (MAP; see references in Appendix), which gener- 
ally involves Markov Chain Monte Carlo (MCMC) sampling 
from the posterior. They are both two-step iterative proce- 
dures in which parameter estimation and data point mem- 
bership are considered separately. Since they do not require 
binning of the data, all information is preserved. We have ex- 
perimented extensively with both. However, due to the large 
number of data points (~ 10 4 photons) in this application, 
we have found that the much simpler and faster procedure 
of fitting to the binned data yields virtually identical results. 
In the Appendix, we describe our implementation of Gibbs 
sampling MAP and how it compares with the much simpler 
method we use in this paper. 

Here, we simply bin the data a nd adopt as our log- 
likelihood the C-statistic (|Cashll 19791 ): 



21n£(p|d) = -2 ^ njne. 



e» 



lnnj 



(2) 



where C{p\d) is the likelihood of the parameters p given the 
data d, Nbin is the number of bins, ni and d are the observed 
and expected number of counts in the i-th bin; m , d are 
obviously functions of the data d and the unknown model 
parameters p respectively. It assumes that the number of 
data points in each bins is Poisson distributed (indeed, it 
is simply the log of the Poisson likelihood). As we describe 
in the Appendix, maximizing this statistic produces iden- 
tical results to more rigorous mixture modeling techniques 
for large number of data points, when the bin size is suffi- 
ciently small. Naively, for a large number of data points one 
might expect \ 2 minimization to work equally well. How- 
ever, in fitting distributions we are sensitive to the wings of 
the Gaussian basis functions, when the expected number of 

7 In this case, the optimal number of mixtures can also be esti- 
mated from the data, via simple criteria such as the Bayesian In- 
formation Criterion (see equation [TJl , or more sophisticated tech- 
niques in so-called Infinite Gaussian Mixture Models. In this pa- 
per, we only investigate separating the two most dominant com- 
ponents of the spectrum, which have the highest signal-to-noise. 
The data is generally not of sufficient quality to allow solving for 
more than two mixtures (strong parameter degeneracies develop). 
Physical interpretation is also most straightforward for the two 
dominant mixtures. 
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counts in a bin is small and the data is therefore Poisson 
rather than Gaussian distributed. 

With the likelihood specified in Equation [21 we sam- 
ple from the posterior usi ng Metropolis-Hastin gs MCMC, 
adapted from CosmoMC (|Lewis fc Bridle! r2002). Each run 
draws ~ 10 5 samples. The first 30% are regarded as burn- 
in and are ignored in the post-analyses. For all the runs, 
we visually exam the trace plots to check for convergence. 
The MCMC analysis yields the best-fit MAP parameters as 
well as the full posterior distribution of parameters, which 
allows us to estimate confidence intervals. In all cases, we 
use non-informative (uniform) priors; the range of possi- 
bilities for the turbulent velocity field is sufficiently large 
that only very weak priors are justifiable. The only obvi- 
ous prior we use is < fi < 1. Note that there are two 
identical modes in the likelihood, since it is invariant under 
permutation of the mixture indices-the well-known identifia- 
bility or "label-switching" problem. Generally, in a k compo- 
nent mixture, there are k\ identical modes in the likelihood. 
During the course of a Monte-Carlo simulation, instead of 
singling out a single mode of the posterior, the simulation 
may visit portions of multiple modes, resulting in a sam- 
ple mean which in fact lies in a very low probability region, 
as well as an unrealistic probability distribution. We en- 
force identifiability in a very simple manner by demanding 
/ii < /i2 and hence s = (12 > 0- While this is known 

to sometimes be problematic JCeleux. Hurn fc Robert|[200ol : 
J asra. Holmes fc Stephens! [2005^ in practice it suffices for 
our simple models. 

For a large number of data points, the distribu- 
tion of model parameters becomes asymptotically Gaus- 
sian, in which case the Fisher matrix can be used 
to quickly estimate joint p arame ter uncertainties (e.g., 
iTegmark. Taylor fc Heavens! j 19971 ) 1. As a consistency 
check, we therefore also calculate the Fisher matrix when- 
ever the input model is simple enough to be expressed ana- 
lytically. It is defined as: 

4 dp t dpj / ' ^ 

where pi is the i-th model parameter. The best attainable 
covariance matrix is simply the inverse of the Fisher matrix, 



(4) 



and the marginalized error on an individual parameter pi is 
(F _1 )ii. Differences between the MCMC and the Fisher 
matrix error bars generally indicate the non-Gaussianity 
of the likelihood surface (or equivalently, that the log- 
likelihood cannot be truncated at second order in a Taylor 
expansion) . 



4 IDEALIZED MODELS 

4.1 Two component Gaussian mixture models: 
General Results 

Before focusing on the specific application to galaxy clus- 
ters, we first consider a more general problem: how well two 
Gaussian profiles can be separated. As mentioned previously, 
Astro-H data quality is generally only sufficient to allow 
solving for the two most dominant mixtures. A two mix- 
ture component is likely the most common scenario, with 




0.1 



s/(a 1 +a 2 ) 



Figure 5. Constraints on the five model parameters with TV^ = 
10 4 data points, as a function of s/(o\ + CT2), where s = [12 — Ml 
is the separation between the means. The curves and points are 
the results obtained using the Fisher matrix and MCMC meth- 
ods, respectively. The dashed lines and circles [green] , dotted lines 
and upward triangles [blue] , solid lines [red] , dot-dashed lines and 
downward triangles [purple] and dot-dot-dashed lines and dia- 
monds [brown] correspond to 02 = (0.6, 0.8, 1, 1.2, 1.4)<ri, respec- 
tively. 
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Figure 6. Error contours for a "SD" case (a\ = 1, gq. = 0.8, 
s = 2(<ri + CT2)): contours depict the 68%, 95% confidence levels 
for the marginalized distribution; the shadings shows the mean 
likelihood of the samples; the solid and dashed curves in the 1- 
D plots are the fully marginalized posterior and relative mean 
likelihood of the samples, respectively. 
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Figure 7. Same as Fig.[6]but for a "WD" case: a\ = 1, a% = 0.1 
s = 0.25(oi + (T2). Note the increased parameter degeneracies. 



the most straightforward physical interpretation. These re- 
sults serve to guide and motivate our later discussions. 
Consider therefore the profile: 



(5) 
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where fi is the fraction of each component, while \ii and 
Oi are the mean and standard deviation of i-th Gaussian 



Figure 8. Constraints on the model parameters for a 2 Gaus- 
sian mixture as a function of the fractional difference in width, 
when s = 0.2 and a\ = 1. As in Fig.fS] the lines and points show 
the results obtained with the Fisher matrix and MCMC tech- 
nique, respectively. The solid lines and squares [red], dashed lines 
and circles [green] , dotted lines and upward triangles [blue] , dot- 
dashed lines and downward triangles [purple] and dot-dot-dashed 
lines and diamonds [brown] are the constraints on /1, fii, s, a\ 
and <T2, respectively 



function. Given the constraint fi = lj there are only 
five model parameters, which we choose to be: /i, fj,i, 
s(= fj,2 — fix), <Ti and 02. Note, 1x2 has been replaced by 
s (the separation between the two Gaussians) since the lat- 
ter, as we will see more clearly later, usually carries clearer 
physical meaning. 

The constraints, expressed in term of standard devi- 
ations A throughout this paper, are forecasted with both 
the MCMC and Fisher matrix methods (for the MCMC 
runs, they correspond to 68%, 95% confidence intervals for 
A, 2 A respectively, even if the parameter distribution is non- 
Gaussian). For each model we create a Monte-Carlo realiza- 
tion with Nd data points and forecast constraints for this 
data set. Motivated by Table [5] we assume Nd = 10 4 . In 
general, the standard deviation of the model parameters 
Ap oc 1/ yNd, though there are some subtleties — see further 
discussions below. The constraints also depend on how much 
the two components differ; if they are difficult to distinguish, 
mixture modeling will fail. For Gaussian components, they 
may differ in fraction fi, mean fii or width Here, we shall 
mostly focus on a situation when the mixing fractions are 
comparable: /1 = 0.4, /2 = 0.6, and focus on how mixture 
separation can be driven by differences in mean ("separa- 
tion dominated", or SD), or width ("width drive", or WD). 
In practice, we care mostly about the case when the mix- 
ing fractions are roughly comparable, since then the different 
components are of comparable importance in reconstructing 
the (emission-weighted) velocity field. As a practical mat- 
ter, it also becomes increasingly difficult to perform mixture 
modeling when one component dominates (though sec ^4.41) . 

The results are shown in Fig. [5] - [9] Fig. [5] shows the 
constraints as a function of the separation s = [12 — fii, nor- 
malized by the sum of the standard deviations: s/(<7i +(72). 
Different line types and point types indicate different values 
of 02 = (0.6, 0.8, 1, 1.2, 1.4) o"i. Note that all five parameters 
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Figure 9. Constraints on the five free parameters for different 
values of N d and f\ . The solid curves and squares are the results 
for the fiducial case: N d = 10 4 , <Ti = 1, crj = 1 and f\ = 0.4; 
results for N d = 10 5 , fi = 0.3, and /i = 0.5 arc shown with 
dashed curves and circles, dotted curves and downward triangles, 
and dot-dashed curves and upward triangles, respectively. 



scale with s/(o"i + 02) in the same way, with fractional er- 
rors which are all roughly comparable. We can identify three 
distinct regimes: 

• Separation Driven For s/(a\ + (72) ;>2, the frac- 
tional errors converge to an asymptotic constant value, in- 
dependent of s/(<ti + 02). In this regime, the separation 
is so large that different components could be viewed as 
individually constrained without mixing from other com- 
ponents. Except for A(s) (which depends on A(/i2) ~ 
(72 / y/ fiNd), this asymptotic convergence is also independent 
of 1T2 (i.e., the relative widths of the distributions don't mat- 
ter when the separation is large). The asymptotic values for 
A UtA, A(crQ/a» and A(/i) are Gi/y/J~N2, l/y/2fiN d and 
y/ fi(l — fi)/N d , respectively; given our N d = 10 4 , this cor- 
responds to ~ 1% accuracy in parameter constraints. 

• Hybrid For 0.3 <Cs/(o"i + (72) <C2, the separation is 
comparable to the sum of widths. The mixing between dif- 
ferent components become severe and the quality of param- 
eter constraints decrease rapidly with decreasing s. Since 
constraints are increasing driven by data points in the tails 
of the respective mixtures (which drive distinguishability), 
the effective number of data points N c a < N d falls. Strong 
parameter degeneracies also develop. 

• Width Driven When s/(oi +(72) <J0.3, the separation 
between the distribution becomes negligible, and component 
separation is driven almost entirely by differences in width 
(note how parameter uncertainties blow up at low s when 
(7i = cr2). It is driven to an asymptotic value determined 
by the effective number of data points in the tails of the 
mixtures, N e g. 

The results obtained with the Fisher matrix (lines) and 
MCMC (points) agree well with each other when the mix- 
tures are easily distinguishable (when s/(<7i + (72) is large 
or (72/(71 is reasonably far away from 1). Otherwise, discrep- 
ancies between these two methods are clear. These discrep- 
ancies are caused by the non-Gaussianity of the likelihood 
surfaces and the priors we placed in the MCMC runs. In this 
regime, one therefore cannot use the Fisher matrix approx- 
imation to the full error distribution. 

Fig. [6] and [7] show the marginalized likelihood distribu- 
tions and error contours for two example runs. Fig. [S]is in 
the SD regime (s = 2(oi +(72))- The likelihood distributions 
are very close to Gaussian, explaining the consistency be- 
tween the Fisher matrix and MCMC results. The contours 
allow direct reading of correlations among parameters. The 
strongest correlation is between /ii and s. As expected, they 
are negatively correlated, since s = fi2 — Mi while the /ij's are 
uncorrelated. Fig.[6]is in the WD regime (s = 0.25((7i +(72)). 
The likelihood distributions now deviate from Gaussians, 
and the correlations among parameters are much stronger. 
These are all consistent with the facts that the constraints 
are worse (due to larger parameter degeneracies) and the 
Fisher matrix results are no long in agreement with MCMC 
results (due to non-Gaussianity of the likelihood surface). 

In Fig. [8] we show how the constraints vary with dif- 
ferences in the Gaussian width in the "width dominated" 
regime. The width of the first component is fixed to a\ = 1, 
while the separation s = 0.2 (note that sj(o\ + (72) ~ 0.2 is 
typically the minimal value expected in cluster turbulence 
when there are no bulk flows, and is due solely to error in the 
mean; see i]2.3|l . In general, the constraints improve as the 
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differences in width increase, consistent with intuitive ex- 
pectations. However, the constraints on s and 02 turn over 
around 02 ~ 2<ti , beyond which they increase with <rjf|. This 
can be understood as follows: the error on these quantities 
receive contributions from confusion error (which dominates 
at low 02) and scaling with 02 (since A(fii) ~ cn/^fiNd and 
Affi ~ ail \pljiN&\ this dominates at high 02)- On the other 
hand, the error on the mixing fraction /1 scales strongly with 
the difference in widths, since it is driven solely by confusion 
error. However, for other parameters the scaling is signifi- 
cantly weaker. For most cluster scenarios, the width-driven 
regime gives relative errors of Ap/p ~ 10%, which is still 
small. 

Fig. [9] shows how the constraints vary with Nd and /1 . 
The fiducial case (solid curves and squares) is computed as- 
suming Nd — 10 4 , <ti = 1, (72 = 0.8 and /1 = 0.4, exactly 
the same as the dotted curves and upward triangles in Fig. 
[S] As we increase the Nd by a factor of 10, most constraints 
are improved by a factor of vTS, consistent with our expec- 
tation that Apt/pi oc 1/^/Nd. This is despite the fact that 
only in the asymptotic SD case are relative errors quantita- 
tively given by the Poisson limit Api/pi ~ l/\/ (fiNd)- This 
is because when mixtures overlap and are in the hybrid/ WD 
regimes, results are driven by the distribution tails, where 
the effective number of data points is still N e g oc Nd. For s 
and jUi, however, MCMC results show better improvements 
in the WD regime than factors of vTO- This might be due 
to reduced parameter degeneracies from the larger number 
of data points. Varying /1 to 0.3 and 0.5 mildly impacts the 
results. As the /1 goes closer to 0.5, constraints improve for 
most parameters, except for /1 and 1T2. Constraints on fi 
are almost unchanged while constraints on 02 are degraded, 
because fewer data points are available in the second com- 
ponent to constrain 02. 

Based on Fig. [5] and [8] we can already anticipate the 
constraints from Astro-H: when there is significant bulk flow 
and the modes have a large relative velocity Ubuik > o"turb, 
parameters can be constrained to ~ 1% accuracy (SD 
regime); when the relative velocity is small but the widths 
are different by a reasonable (a few tens of percents) amount, 
the parameter estimates are accurate at the ~ 10% level 
(WD regime). Given the modeling uncertainties in the phys- 
ical interpretation of these parameter estimates, such accu- 
racy is more than adequate. Next, we will consider two spe- 
cific examples of the SD regime and WD regime respectively. 

4.2 Application to Clusters: the Single Line 
Scenario 

We begin our discussion of mixture modeling of cluster emis- 
sion line spectra with the simplest case. For now we ignore 
line blending and continuum emission, and only consider 
one emission line - the He-like iron line at 6.7 keV. Again, 
we assume the PDF is composed of two Gaussian compo- 
nents. Most of these assumptions will be relaxed later. We 
assume the cluster is isothermal with a temperature of 5 
keV. The assumption of an isothermal distribution is of 
course somewhat crude for the entire cluster. However, for 

8 The turnover does not appear in the last panel of Fig. [5] because 
there the y-axis is A(cr2)/&2 rather than A(<T2) 




Energy (keV) 

Figure 10. The mock spectra (data points) and best-fit models 
for the WD (upper panel) and SD (lower panel) cases in the 
single line scenario. The red solid and dot-dashed lines are the 
input overall spectra and individual components respectively. The 
green dashed lines are the recovered components. The recovery is 
remarkably accurate, even when (as in the top panel) the spectra 
is visually indistinguishable from a single component Gaussian. 



nearby clusters, the emission-weighted spectrum is accumu- 
lated from a small area where temperature variations are 
generally mild (< 0.5 keV). Moreover, our results are not 
very sensitive to the temperature distribution. We express 
our results in terms of the bulk peculiar velocity of the first 
component (v pec ), the relative velocity between the two com- 
ponents (v re i), and the 3D turbulent velocity dispersions of 
each component (w t b,i and v t b,2)- We assume isotropic tur- 
bulence, so the line of sight velocity dispersion is ut^i/v^. 
They are related to the Gaussian PDF via: 

Mi = fo + ^o , (6) 

c 

V re l 

S = V , 

C 

ai ~ \/ a tb,i + a ther + °fnstr> 

Vtb,i 
Ctb.i = VO— 

V3c 

uo I kT 
c y Arrip ' 

where vo is the line frequency in the rest frame, ai nstr is 
the standard deviation of instrumental noise (FWHM/2.35), 
A is the atomic weight of iron and m p is the proton 
mass. In our WD example, we assume (utb.i , «tb,2) = 
(150,300)kms~ 1 , and v Icl = 100 kms" 1 . For the SD exam- 
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Table 3. Input parameters for the WD case and recovered best-fit parameters together with their 1-cr errors. Also shown are the 
predicted uncertainties using the Fisher matrix technique. Note that »pec,"iei are line of sight quantities, while «tbl> , 'tb2 are 3D 
velocity dispersions (assuming t)| D = 3« 2 D ). 





Input 


fl 
0.1 


i; pec (km/s) 



D re ;(km/s) 
100 


^i6,i(km/s) 
150 


«ii,,2(W s ) 

300 


Single line 


MCMC 
Fisher Matrix 


46+ 18 
(0.12) 


11 Q4+H.62 

(12.99) 


89.48±?I;°1 
(14.06) 


164.89l^ 5 T 
(37.03) 


298.82l 1 13 ;« 
(10.75) 


Multiple lines 


MCMC 
Fisher Matrix 


42+ a25 
(0.12) 


-14.24^ 7 79 5 
(11.28) 


130.72±ff;?| 
(17.37) 


145.48if 3 j 9 5 
(36.75) 


291.64ll 4 4 ° 9 
(10.65) 


Multiple lines 
plus continuum 


MCMC 
Fisher Matrix 


64+ ' 22 
(0.19) 


5 44 +lS.10 
(15.36) 


147.88™ 
(27.04) 


180.16l 2 ?;g 
(53.07) 


sos.utllit 

(17.19) 



Table 4. Same as Table [3] but for the SD case. 





Input 


fx 
0.4 


i) pec (km/s) 



i) rei (km/s) 
500 


%,i(W s ) 

150 


fi6,2(km/s) 
300 


Single line 


MCMC 
Fisher Matrix 


40 +a(J1 
"•^-0.02 

(0.01) 


o 1 rr+7.28 

z - 1 '-6.40 

(6.40) 


493.651482 
(4.52)' " 


152.7lt^-j 2 
(11.45) 


310.981533 
(9.77)' " 


Multiple lines 


MCMC 
Fisher Matrix 


41 + ' 01 
(0.01) 


2 47+6.18 
(5.73) 


505.2814^3 
(4.15)' " 


148.71±^ 6 f 
(12.39) 


294.13l*j* 
(9.24) 


Multiple lines 
plus continuum 


MCMC 
Fisher Matrix 


40+ ' 02 
(0.02) 


-o.oit™* 

(6.72) 


486.431"° 
(4.61) 


1 67 02+ 15 ' 33 
(15.75) 


309 75+ 14 ' 04 
(12.50) 



pie, we assume the same Vtb,i, «tb,2, but v rc \ = 500 kms - . 
In all cases, we assume the bulk velocity zero-point v pcc — 
Okms _1 Sloshing in the cluster potential well gener- 
ally results bulk motions wit h transonic Mach numbers 
i|Markevitch fc Vikhlininll2007l ). so such a value is realistic 
for a 5 keV cluster (with sound speed c s ~ 1000 kms -1 ) 
along an arbitrary line of sight — indeed, such velocities are 
found in the simulated cluster in ol2.ll With these assump- 
tions, the widths of the first and second component, in- 
cluding instrumental, thermal and turbulent broadening, are 
3.54 and 4.87 eV; the offsets between peaks are 1.12 and 
11.17 eV for the WD and SD cases, respectively. These pa- 
rameter choices correspond to s/(<ti + 02) = (0.13, 1.3) re- 
spectively, and thus can be compared to expectations from 
Fig. [5] Note that the SD case is not quite in the asymptotic 
regime s/(a"i + a 2) ,>3 yet (where the relative errors would 
be ~ 1/V/iiVd ~ 1%), but it is fairly close. 

The mock spectra and best-fit models for 10 4 photons 
are shown in Fig. 1101 The best-fit parameters and their un- 
certainties are listed in the first row of Table [3] and U In 
accordance with expectations from t)4.1l component recov- 



9 Note that if the redshift of the collisionless component of the 
cluster (which does not participate in gas bulk motions) can 
be determined to high accuracy by spectroscopy of numerous 
galaxies, then v pcc ,v IC \ — v pec give the line of sight bulk veloc- 
ities with respect to the cluster potential well. For instance, for 
nearby clusters where N gBl i ~ 400 galaxy redshifts have been 
measured, the relative error in the center of mass redshift is 
~ 1000km s~ 1 /(v / 3\/^Vgal) ~ 30kms — . Otherwise, only u re j (the 
relative bulk velocity) is of physical significance. 




6.62 6.64 6.66 6.68 6.7 6.72 
Energy (keV) 



Figure 11. Same as Fig. UOI but for the entire iron line complex. 
The continuum has also been included, assuming a metallicity of 
0.3 Zq. 
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ery is remarkably accurate. Even is the WD case, which is 
visually indistinguishable from a single Gaussian (see top 
panel of Fig. I10p , the decomposition into the original mix- 
tures is very good, and most velocities are constrained to 
within ~ 10 — 30kms _1 , which is significantly higher accu- 
racy than needed to model the physical effects of bulk mo- 
tions and turbulence in the cluster. This showcases the great 
potential of high spectral resolution instruments. Of partic- 
ular interest is the constraint on the mixing fraction, which 
is a very good indicator of our ability to separate different 
components. A confident detection of multiple components 
should have /i/A(/i) larger than a few, i.e., the best-fit frac- 
tion should be at least a few a away from "non-detection" 
(/i or /2 equal to 0). In the single line scenario, the 1 — a 
error of /i is 0.01 and 0.12 in the SD and WD cases, respec- 
tively, consistent with our expectations from discussions in 
the previous section. However, a large fraction of the con- 
straints in the WD case is from the tails, and could easily 
be affected by continuum emission (see discussion below). 
Also note the general consistency between Fisher matrix 
and MCMC techniques, indicating the Gaussian shape of 
the likelihood surface for this scenario. 

4.3 The Impact of Multiple Lines and Continuum 

In this section, we consider the impact of multiple lines and 
continuum emission. Iron lines appear as a line complex be- 
tween 6.6 and 6.75 keV, and these lines inevitably blend 
together. Multiple lines have two competing effects. First, 
taking all lines into account-all of which have identical mix- 
ture decompositions-means more photons, which reduces 
shot noise in parameter estimates. The photons from the 
entire line complex is about twice that from the He-like iron 
line alone. Secondly, as different lines blend together, infor- 
mation contained in the shape of individual lines is partly 
lost due to blending in the line wings. The latter are cru- 
cial to driving parameter estimation in the hybrid and WD 
cases (note however, from Fig. [11] that the lowest and high- 
est energy lines in the complex have low/high energy line 
wings respectively which are unaffected by blending. This 
is particularly important in the case of the high energy He- 
like line, which is by far the strongest line in the complex). 
These two factors have opposite effects on the constraints. 
As in the previous sub-section, we run MCMC chains and 
Fisher matrices to estimate the constraints. The properties 
of the line complex were taken from ATOMDB databasJ"! 
(v. 2.0.1). To save computing time, we only included the ten 
strongest lines lines. Fisher matrix estimates including more 
lines show negligible difference. 

The results are listed in the second row of Table [3] and [4] 
In the SD case, the constraints estimated using both MCMC 
and Fisher matrix techniques are very close to those in the 
single line scenario, indicating almost total cancellation be- 
tween the effects just mentioned. In the WD case, the con- 
straints from the Fisher matrix technique are again close to 
the single line scenario. However, the results from MCMC 
runs show asymmetry, and in general, the constraints are 
worse than in the single line scenario. Line blending seems 
to make the likelihood surface significantly non-Gaussian. 

10 http://www.atomdb.org/ 
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Figure 12. Model selection: regions of the /i - v re i plane where 
the double component model is preferred according to the BIC, for 
v th l = 100 kms" 1 , u tbj2 = (200, 300, 400) kms^ 1 , and /i = 0.4, 
Vpcc = Okms -1 . 

Next, we include the effect of continuum emission. Con- 
tinuum acts as a source of background noise. Even though 
we can measure and subtract the continuum, doing so in- 
troduces shot noise, particularly in the line wings when Fe 
line emission and continuum brightness can become com- 
parable, or continuum emission could even dominate. The 
relative level of continuum and Fe line emission is controlled 
by metallicity; larger metallicities imply brighter lines. The 
mean metallicity of clusters is typically Z ~ 0.3 Z©, which 
we shall assume, though the metallicity in the cluster center 
is often higher due to contributions from the cD galaxy. We 
apply our mixture model incorporating both the effects of 
line blending and continuum; the results are shown in Ta- 
ble [3] and [4] and in Fig. [11] (for the purpose of clarity, only 
1/3 of the data points are shown in this figure). The results 
are as one might expect. In the SD case, the constraints are 
only slightly worsened, since the mixtures are clearly sepa- 
rated, and almost all the ~ fiNd points in a given mixture 
can be used for parameter estimates; only a small fraction 
in the line tails are contaminated by line blending and the 
continuum. The constraints in the WD case are more badly 
affected, since the constraints in this case are largely drawn 
from the tails; here, the differences between the MCMC and 
Fisher matrix techniques are also further enlarged. The pres- 
ence of the continuum and line blending limit the domain of 
the WD regime, which is no longer strictly independent of 
s/(<Ti+<T2). For instance, if we assume v IC \ — 50kms _1 (cor- 
responding to s/(cti + (72 ) = 0.067), the MCMC simulations 
fail to converge). They thus limit our ability to constrain 
components with small separations, though in practice such 
small separations should be rare. 

4.4 Model Selection: When is a Mixture Model fit 
Justified? 

Thus far, we have only considered how accurately mixture 
model parameters can be constrained. However, this begs 
the question of whether a mixture model approach is justi- 
fied at all, particularly when (as in the WD case) the ob- 
served emission line is visually indistinguishable from a sin- 
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Figure 13. Model selection: shaded regions shows the regions of 
(t> rc l,t>tb 2) parameter space where the double component model 
is preferred according to BIC, while the hatched regions are where 
mixing fraction is accurately constrained: A(/i) < (0.2,0.1) (cyan 
and purple hatches respectively). All other parameters are as in 
Fig. [TH 



gle Gaussian. Introducing additional parameters will always 
result in an improved fit, even when these parameters are 
largely irrelevant and of little physical significance. This is 
essentially a mo del selection problem. We use information 
criteria (e.g., see iLiddld (2004)) which penalize models with 
more parameters, to identify preferred models. While they 
have solid underpinnings in statistical theory, fortunately, 
they have very simple analytic expressions. In th is paper, 
we us e the Bayesian Information Criterion (BIC; ISchwara 
dUli)): 



BIC\ 



-21n£ max 

+ HniV 



(7) 



where £ max is the maximum likelihood achievable by the 
model, k is the number of free parameters, and N is the 
number of data points; the preferred model is one wh ich min- 
imize s BIC. The BIC comes from the Bayes factor (|jeffrevsl 
which gives the posterior odds of one model against 
another. We use it o ver the c l osely related Akaike Informa- 
tion Criterion (AIC; lAkaikei l|l974h ). which places a lower 
penalty on additional model parameters. Thus, we adopt a 
conservative criterion for preferring mixture models. The ab- 
solute value of the BIC has no significance, only the relative 
value between models. A difference of 2 is regarded as posi- 
tive evidence, and of 6 or m ore as strong evidence, to prefer 
the model with lower BIC I Jeffreys! 1 196ll ; iMukheriee et alj 
1998). Note that the BIC does not incorporate prior infor- 
mation. This is possible w ith the more so phisticated notion 
of Bayesian evidence (e.g.. iMackavl (|2003l )). but involves ex- 
pensive integrals over likelihood space, and is unnecessary 
in our case since we adopt uninformative priors. 

We aim to distinguish the double component model 
with k = 5 (free parameters: (v pe c, «rel, fx, Vtb,i> «tb,2), 
against the single component model with k — 2 (free pa- 
rameters: /J., a). We create simulated data sets which have 
two underlying components, and see which regions of pa- 
rameter space the BIC will correctly prefer the two com- 
ponent model. Our simulated line profiles incorporate the 
additional effects of thermal and instrumental broadening, 



continuum, and line blending. Rather than exploring the 
full 5 dimensional space, we explore the most interesting 
subspace to see where model selection is effective. In Fig. 
1121 we explore model selection in the /1 - v IC \ plane, for 
v t b,2 = (200, 300, 400) km s" 1 , and v pcc = Okms" 1 , /1 = 
0.4, Vtb.i = 100 kms - . The plot shows where the BIC for 
the double component fit is smaller than that for the single 
component fit (note that the BIC is obtained by allowing for 
variation in all fitted parameters; we are just plotting model 
selection in a subspace). When «tb,2 = 400 kms -1 , all val- 
ues of w ro i and all 0.1 < fi < 0.9 permit correct selection of 
the double component model. The result is very similar for 
«tb,2 = 300 kms , but for Vtb,2 = 200 kms -1 , if both fi,v Te \ 
assume low values, the double component model is not pre- 
ferred. Overall, it is reassuring to see that model selection 
is not very sensitive to /1, since we previously restricted 
our studies to /1 = 0.4. Thus, even if a smaller fraction of 
the emission weighted volume has a markedly different ve- 
locity structure, it will be detectable in the spectrum. In 
Fig. 1131 we show the regions of (ti re i> «tb,2) parameter space 
where the double component model is preferred according 
to BIC. Overall, as expected, the mixtures can be distin- 
guished if v rc \ or «tb,2 are large; for Astro-H and with the 
adopted parameters, this is of order 200 kms -1 . In addition, 
we show the regions where the mixing fraction /1 is accu- 
rately constrained to A(/i) < (0.1,0.2), since the error on 
the mixing fraction should be a good indicator of our ability 
to distinguish mixtures. We use the Fisher matrix formalism 
to calculate these constraints. The results are qualitatively 
similar that obtained with the BIC, though somewhat more 
restrictive. 



4.5 Non-Gaussian Mixture Components 

All the preceding discussions are based on the assumption 
that the PDFs of individual components are Gaussian, which 
is not true in general. As we see in Fig. [4] individual mixtures 
show deviations from Gaussianity, i.e. Gaussians are a good 
but imperfect set of basis functions. In principle, this can 
be dealt with by fitting higher order mixture models, but in 
practice the data quality from Astro-H does not allow this; 
parameter estimation becomes unstable and large degenera- 
cies develop, particularly since the higher order mixtures 
generally have low mixing fractions fi. Unless their velocity 
means or widths are very different, the physical interpreta- 
tion of these additional components is also more difficult. 
Here we construct a simple toy model to isolate the effects 
of non-Gaussian components. As there are many flavors of 
non-Gaussianity, the results we show are meant to be illus- 
trative rather than definitive. 

To this end, we extract PDFs from a simulated re- 
laxed cluster, use them as the "basis" PDFs of individual 
components, resize and combine them to produce a com- 
posite PDF, which is in turn used to generate mock spec- 
tra. The "basis" P DFs are extracted from a simulation by 
IVazza et all (|2010l ). which the authors kindly made public; 
we sample different PDFs by looking along different lines 
of sight. The cluster, labeled as E14, has a mass of M ~ 
10 15 Mq and experienced its latest major merger at z > 1. 
Due to shot noise which arises from the finite resolution (25 
kpc/2, -1 ) of this simulation-which results in a small number 
of cells-we are forced to extract the emission weighted veloc- 
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Table 5. Non-Gaussian mixture components: input parameters (obtained by shifting and rcscaling the non-Gaussian mixtures) and 
recovered best-fit parameters together with their 1-<t errors, for both the WD and SD cases, as in Fig. 1141 Note that the Fisher matrix 
results-which require an analytic likelihood-assume Gaussian mixtures, and hence are the same as Tables [3] and [4] 



Vpec (km/s) u re! (km/s) i> ti); i (km/s) u t (, j2 (km/s) 



Input 0.4 100 150 300 

WD MCMC O.nto.fn -33.18±^-j3 112.49±^ 63.80l^ 8 5 ^ 284.99l^ 8 | 2 
Fisher Matrix (0.19) (15.36) (27.04)' (53.07) (17.19) 

Input 0.4 500 150 300 



SD MCMC 



1-0.01 s 17+5-83 ™ 90 +4.49 1fio9s +12.09 0H „ ,-. + 13.81 



Fisher Matrix (0.02) (6.72) (4.61) (15.75) (12.50) 
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Figure 14. The velocity PDFs in the WD (upper panel) and SD 
(lower panel) cases. The solid (red) and dashed (green) curves 
are the input and recovered PDFs, respectively. The thick curves 
are the overall PDFs, while the thin curves show the individual 
components. 



ity PDFs from a large volume of 400 x 400 x 1000 kpc 3 . The 
PDFs are shifted (to match means), linearly rescaled (to 
match variances) and combined to produce the same WD 
and SD cases in § I4.3l"1 We then convolve the composite 
PDFs with thermal broadening and instrumental noise for 
the entire Fe line complex, and add continuum to produce 
mock spectra. Finally, we fit the mock spectra to separate 
and constrain the two components. The results are shown 



11 We emphasize that this procedure is not meant to simulate 
what an realistic observation would see, which we treat in jj5] It 
is a toy model in the spirit of the preceding sub-sections, where we 
use simulations to generate non-Gaussian mixture components. 



in Fig. Q3] and Table [S] In Fig. Q3] the solid (red) curves 
are the input PDFs while the dashed (green) curves are the 
best-fit model. The thick and thin curves are the total PDF 
and individual components, respectively (note that because 
we display the velocity PDF rather than the spectrum, the 
multiple lines in the Fe complex, as well as the continuum 
and thermal/instrumental broadening, are not shown. How- 
ever, all these effects are included in the simulations). In the 
SD case (lower panel), the two components are recovered 
almost perfectly. In the WD case (upper panel), however, 
there are some discrepancies between the input and output 
PDFs. The same conclusion can be drawn from Table [5] in 
the WD case, the best-fit values of /i and Vtb,i are some- 
what different from the input values. However, they are still 
within the (large) errors. Comparing Table [5] with Table [3] 
and 21 we see that at least in this case, non-Gaussian com- 
ponents have limited effect on the results. Note the strong 
discrepancy between MCMC and Fisher matrix error bars 
in both cases, and in particular the strong asymmetry in 
MCMC errors. We repeated the same exercise several times 
with PDFs randomly drawn along different lines of sight 
from the same simulation. In most attempts, we are able to 
recover the input parameter values within the uncertainties. 
Thus, conclusions based on Gaussian components are still 
applicable when the true PDFs deviate from Gaussianity by 
a reasonable amount. Instrumental and thermal broadening, 
which gaussian, effectively smooth out small scale deviations 
from Gaussianity. 



5 RESULTS FROM NUMERICAL 
SIMULATIONS 

5.1 Cold Front Cluster 

Finally, we apply our tool to cluster simulations. In the first 
example, we attempt to recover the velocity PDFs shown in 
Fig. [4] which derives from a cosmological ENZO simulation 
of a cold front cluster. These two cases, which come from 
different lines of sight through the same cluster, correspond 
to s/(<7i + <72) = 1.42 and s/(ai+<72) = 0.42 respectively, i.e. 
in the "separation-driven" and "width-driven" regimes. We 
first fit the PDFs with a mixture model when no sources of 
noise or confusion are present, to derive the "true" parame- 
ter values. We then generate mock spectra by adding ther- 
mal and instrumental broadening, continuum emission and 
line blending to the PDFs, and then apply mixture model- 
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Table 6. "Cold front" cluster: best-fit parameters and their uncertainties for the PDFs in Fig. 1151 obtained using the Enzo simulation 
described in i|2.1l Case 1 and 2 are the top and bottom panels of Fig. 1151 respectively. The "true values" are obtained by fitting the 
PDF directly, while the recovered values are obtained from the mock spectrum of 10 4 photons, which includes line blending, thermal 
and instrumental broadening, and continuum emission. 
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Figure 15. "Cold front" cluster: the solid (red) curves are the 
same velocity PDFs as in Fig. [4] The dashed (green) curves are the 
recovered PDFs from the best-fit models, and the dotted (blue) 
curves are the individual components. Numerical values of the fit 
parameters are in Table [6] 



ing to the results. The results are given in Fig. 1 151 and Table 
[6] Overall, the results are very good. The best-fit models 
successfully recover the general features of the PDFs, and 
accurate parameter estimates with uncertainties which are 
consistent with our estimates from the toy models - on the 
order of ~ 10% for the width-drive case (case 2) and ~ 1% 
for the separation-driven case (case 1). No systematic biases 
appear to be present. As we discussed in i]2.3l these param- 
eters all have physical significance: u t b,; relates to the tur- 
bulent energy density in each component, to the emission 
weighted volume fraction of each component, and v TC i to the 
bulk velocity shear between them. We also applied the single 
component model to the same mock spectra, and compared 
the BIC values. In both cases, the double component model 




le-27 le-26 

Density- 



Figure 16. Density map and velocity field on the y—z plane. The 
size of the figure is 1 Mpc; the bulk motion along the y-direction 
has been substracted out. 

is preferred (case 1: -B/Cdoubio — BIC' s i ng i c = —1002; case 2: 

B/Cdoublc — BlCsinglc = —10). 

5.2 AGN Feedback Cluster 

The second example is a FLASH simulation with static grav- 
ity and radiative cooling of a cluster with an AGN in the 
center (hereafter denoted as "AGN feedback"); a simulation 
snapshot was kindly provided to us by Marcus Briiggen. The 
si mulated cluster , mean t to mimic Hydra A, i s desc ribed 
in iBriiggen et al.l l|2007l ) and ISimionescu et alj (120091 ): nu- 
merous plots of the velocity field can also be found in 
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Table 7. "AGN feedback" cluster: best-fit parameters and their uncertainties for the simulated PDF in Fig. 1171 The "true values" are 
obtained by fitting the PDF directly, while the recovered values are obtained from the mock spectrum of 10 5 photons, which includes 
line blending, thermal broadening, instrument noise, and continuum emission. 



fl D pec (km/s) « rei (km/s) u ti)i i(km/s) u tf , i2 (km/s) 



True values 0.28;to;°i — 5.89±g;|| 2.91^ ^ 44.72±]J| 240.07±?; 
Recovered 0.30±g;gf — 1.58±?-|| — 4.38±|;|| 23.43±5g;|g 246.80±£°, 
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Figure 17. "AGN feedback" cluster: the solid (red) curves are 
the velocity PDFs from the simulation. The dashed (green) curves 
are the recovered PDFs from the best-fit models, and the dotted 
(blue) curves are the individual components. Numerical values of 
fits are in Table [7] 




Figure 18. Error contours for the "AGN feedback" case: con- 
tours depict the 68%, 95% confidence levels for the marginalized 
distribution; the shadings shows the mean likelihood of the sam- 
ples; the solid and dashed curves in the 1-D plots are the fully 
marginalized posterior and relative mean likelihood of the sam- 
ples, respectively. The stars and vertical lines label the positions 
of the true values. 



IVazza. Roediger fc Brueggenl (|2012l ). Here, we briefly sum- 
marize some properties. The box size was 1 Mpc and AMR 
resolution reached a peak of 0.5 kpc in the center, and a 
maximum of (1,4,8) kpc outside (16,100,200) kpc respec- 
tively. A bipolar jet 2 kpc in diameter with power Lj e t = 3 x 
10 45 ergs -1 was then introduced; for the analyzed snapshot 
the bulk velocity along the jet is ~ 1500 — 1800 kms -1 , and 
a M ~ 1.3 shock has been driven into the surrounding ICM. 
The AGN was also given a bulk velocity of ~ 670 km s - 
along the direction of (-1,1,0) relative to the ambient ICM, 
to mimic the observed offset between the shock center and 
the AGN in Hydra A. In Fig |16l we show a 1 Mpc size density 
and velocity field map on the y-z plane through the center. 
The size of the figure is 1 Mpc. The large bulk velocity along 
the x-direction has been subtracted from the figure. The out- 
flows from the AGN stir the gas in the central region (~ 300 
kpc in radius) , while the ambient gas is left relatively quies- 
cenQ The velocity field is predominantly radial outside 100 
kpc (associated with jet expansion and the running shock), 
while it is close to isotropic within 100 kpc, indicating that 
instabilities have efficiently isotropized and distributed the 
jet powerF^I In Fig. 1171 we plot the emission- weighted veloc- 
ity PDF along the z-direction inside an area of 1 x 1 Mpc 2 . 
The division between turbulent and quiescent gas shows up 
in the velocity PDF as a double Gaussian distribution - a 
narrow Gaussian corresponding to the quiescent gas outside 
the core, and a broad Gaussian corresponding to the tur- 
bulent gas in the center. This is an example of non-volume 
filling turbulence discussed in § 12.21 Note that we have pes- 
simistically chosen a viewing direction in which there are 
no bulk motions (similar to the "width driven" case of the 
preceding example). For other viewing angles, the jet expan- 
sion drives bulk motions which result in two clear peaks in 
the spectrum (similar to the preceding "separation driven" 
case) . 

12 The small velocity dispersion (~ 45 kms -1 ) of the quiescent 
region in this example come from the fact that apart from AGN 
outburst, it is a relaxed cluster which has not experienced any 
recent major mergers. Note, however, that the initial conditions 
come from cosmological GADGET SPH simulations where the 
small scale gas motions may not have been fully resolved. 

13 Note, however, as also discussed by 

IVazza. Roediger fc Brueggenl <2012h . that these simulations 
are purely hydrodynamic, and magnetohydrodynamic (MHD) 
effects can strongly affect fluid instabi lities and energy trans- 
fer from AGN bubbles to the ICM llRuszkowski et al.l |2007|; 
iDursi fc Pfromme j [iooj; lO'Neill. DeYoung fc Jones! 120091). For 
instan ce, 3D MHD simulations of bipolar jets bv lO'Neill &: Jonej 
(2010) find to the contrary that jet energy is not efficiently 
distributed/isotropized, remaining instead near the jet/cocoon 
boundary. 
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The scales in this Hydra A example are so large that in 
this particular instance, the velocity structure could be spa- 
tially resolved by Astro-H. However, Hydra A is of course 
an extremely rare and energetic outburst; for more typi- 
cal jet luminosities of Lj e t ~ 10 44 ergs _1 , the turbulently 
stirred region will be at least a factor of ~ 30 1//3 ~ 3 smaller 
or ~ 100 kpc in size, and hence barely resolved by Astro- 
H. In this instance, mixture modeling will still be required 
to uncover the filling fraction of turbulence. Also, as pre- 
viously discussed, MHD simulations show that motions are 
not efficiently isotropized and distributed within the region 
of influence of the AGN, so in reality there could be small 
scale intermittency in turbulence which would be spatially 
unresolved, but detectable with mixture modeling. 

To approximate such situations, we analyze the spec- 
trum with the velocity PDF shown in Fig. 1171 where the ef- 
fects of line blending, thermal broadening, instrument noise, 
and continuum emission have been included. This is a clear 
example of the "width driven" scenario, with <7i/<72 = 0.70. 
We were unable to recover the velocity PDF from the mock 
spectrum with 10 4 photons. The estimated BIC values for 
the single and double component models using the "true val- 
ues" indeed show that the single component model is pre- 
ferred for 10 4 photons (BICdoubic — fl/C 8 ingi e =17). However, 
with 10 5 photons (which is for instance, possible for Perseus; 
see Table [2]), the two components could be easily separated 
(in this case, SJCdoubie — SJC S ingi e =-102). The results are 
given in Table [7] Again, here the "true values" are obtained 
by fitting the PDF directly, by generating a Monte-Carlo 
sample of 10 4 photons. The corresponding 2-D error con- 
tours and marginalized posterior are shown in Fig. 1181 Note 
the firm lower limit of ~ 30% to the quiescent component; 
a clear detection that turbulence is not volume-filling. The 
velocity dispersion and hence the energy density in the tur- 
bulent component are also accurately recovered. 



6 CONCLUSIONS 

Gas motions can have profound influence on many physi- 
cal processes in the ICM, but thus far we have lacked a di- 
rect measurement of turbulence in clusters. Upcoming X-ray 
missions-in particular Astro-H-are poised to change that, 
by directly measuring turbulent broadening of spectral lines. 
Thus far, most work has focussed on how gas motions can al- 
ter the mean and width of X-ray emission lines from galaxy 
clusters. However, the detailed shape of the line profile has 
valuable information beyond these first two moments. Ex- 
ploiting the line shape (and thus the high spectral resolution 
of upcoming missions such as Astro-H) can in many cases 
ameliorate poor angular resolution in inferring the 3D veloc- 
ity field. The main point of this paper is that the line-of sight 
velocity PDF can often be meaningfully decomposed into 
multiple distinct and physically significant components. The 
separation is based on deviations of line profiles from a sin- 
gle Gaussian shape, driven by either the difference in width 
( "width-driven" , WD) or mean ( "separation-driven" , SD) of 
the components. Such a mixture decomposition yields quali- 
tatively different results from a single component fit, and the 
recovered mixture parameters have physical significance. For 
instance, bulk flows and sloshing produce components with 
offset means, while partial volume-filling turbulence from 



AGN or galaxy stirring leads to components with different 
widths. The offset between components allows us to measure 
gas bulk motions and separate them from small-scale turbu- 
lence, while component fractions and widths constrain the 
emission weighted volume and turbulent energy density in 
each component. With the MCMC algorithm and Fisher ma- 
trix techniques, we evaluate the prospects of using Gaussian 
mixture models to separate and constrain different velocity 
modes in galaxy clusters from the 6.7 keV Fe line complex. 
We found that with the 10 4 photons (which is feasible for the 
~ 14 nearest clusters; see Table[2]), the components could be 
constrained with ~ 10% accuracy in WD cases, and ~ 1% 
accuracy in SD cases, in both toy models and simulations 
of clusters with cold fronts and AGN feedback respectively. 
Continuum emission degrades the constraints in WD cases, 
while it has little impact on the SD cases. On the other 
hand, line blending appear to have little impact. We gen- 
erally find that Astro-H is effective in separating different 
components when either the offset between the components 
or the width of one of the components is larger than ~ 200 
km/s. Using PDFs taken from numerical simulations as "ba- 
sis" functions, we find that reasonable deviations from Gaus- 
sianity in the mixture components do not affect our results. 
We also study error scalings and use information criteria to 
determine when a mixture model is preferred. 

Many extensions of this method are possible. For in- 
stance: (i) It would be interesting to compare the separation 
between bulk/turbulent motions obtained from mock X-ray 
spectra by mixture modeling, with algorithms for perform- 
ing this separation f or the full 3D velocity field in nume ri- 
cal simulations (e.g., IVazza. Roediger fc Brueggenl ( |2012h ). 
to see how close the correspondence is. (ii) In this study, 
we have assumed that due to Astro-H's poor spatial resolu- 
tion, only line-of-sight information about the velocity field 
is possible. In principle, it should be possible also to obtain 
information about variation of the velocity field in the plane 
of the sky. For nearby clusters such as Perseus, it should 
be possible to examine the line shape as a function of pro- 
jected radial position to obtain a full 3D reconstruction of 
the velocity fi eld (a more deta i led im plementation of the 
suggestion by IZhuravleva et ail ||2012T ) to study the varia- 
tion of line center and width with projected radial position). 
It would be very interesting to study the variation of mix- 
ture parameters as a function of position in high-resolution 
simulations. Even for more distant clusters, a coarse-grained 
tiling of the cluster should be possible, (iii) High resolution 
X-ray imaging of cold-front clusters yield information about 
density /temperature contact discontinuities in the plane of 
the sky. This has already been used infer the presence of 
sloshing and bulk motions, as well as physical properties of 
the ICM such as viscosity and thermal conductivity. Com- 
bining information about the density/temperature contact 
discontinuity in the plane of the sky with the line of sight 
information obtained by mixture modeling could enhance 
our understanding of gas sloshing in clusters, and give more 
precise constraints on velocities. It would likewise be inter- 
esting to employ mixture modeling on spectra of the violent 
merger clusters with classic bow shocks. 

More generally, mixture modeling of spectra should 
prove useful whenever there are good reasons to be- 
lieve that there are multiple components to the ther- 
mal or velocity field, and/or the line profile shows signif- 
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icant deviations from Gaussianity. For instance, it might 
be fruitful to consider app l icatio n s to the ISM (e.g.. 



iFalgarone. Hilv-Blant fc Petvl (|2004| ;[l azarian fe Pogosvan 



2006)) . or Lyq emission from galax i es (e.g ., lHansen fc Oh 



2006h lDiikstra fc Hultman Kramer! ( |2012h ) 
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APPENDIX A: GIBBS SAMPLING MCMC 
MIXTURE MODELS 

In this paper, we have adopted a "poor man's" approach 
to mixture modeling, in that we have binned the data, and 
dealt with the log-Poisson likelihood of the binned data. In 
principle, binning destroys information; however, in prac- 
tice we have found that given the large number of data 
points, this information loss is negligible. We chose to adopt 
this approach since it is simpler and faster (each Monte 
Carlo simulation now requires operations over ~ 50 bins 
rather than ~ 10 4 data points; hence it is about ~ 200 
times faster). In this Appendix, we describe a more statis- 
tically rigorous way of Bayesian parameter estimation with 
mixt u re models (Roeder fc Wassermanlll997l; iGelman et al.l 
120041 ; iMarin. Mengersen fc Robertl 120051 ; iKellvl 120071 ), and 
show how it compares with this simpler method. We use the 
convention used in statistical literature of "~" to denote "is 
distributed as" or "is drawn from", rather than the usual 
"is to rough order of magnitude" in astronomical literature. 

We will perform MCMC samples of the full poste- 
rior distribution of the mixture model. This requires pri- 
ors to be specified; formally, for Gaussian mixture mod- 
els, an injudicious choice (such as a uniform prior) can 

lead to improper (non-integrable) posterior densities (see 
l ' I i — 

IRoeder fc Wasserman ( 1997) for further discussion of this 
issue). A particularly convenient choice of priors are conju- 
gate priors, in which the posterior distributions p(8\x) are 
in the same family as the prior probability distribution p{9). 
For instance, the Gaussian family is conjugate to itself (or 
self-conjugate) with respect to a Gaussian likelihood func- 
tion: if the likelihood function is Gaussian, choosing a Gaus- 
sian prior over the mean will ensure that the posterior dis- 
tribution is also Gaussian. Fortunately, all members of the 
exponential fami ly have conjugate p riors; for our mixture 
model, they are IjGelman et al.ll2004n : 



(wi, 



2 



Dir(ai, . . . ,a k ) 

Af(p.j,Tj) 

Inv — Gamma(flj, /3j). 



(Al) 
(A2) 
(A3) 



These prior distributions in turn require further 
parameters-known as hyper-parameters-to be specified. In 
the absence of prior information, techniques exist to allow 
these hyper-parameters to become additional parameters in 
the statistical fit, and thus be d etermined by the data it- 
self l|Roeder fc Wasserman! ll997T l. However, given that we 
do have guidance from both theory and other observations 
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(e.g. X-ray imaging; morphology; temperature and density 
profiles) about the expect thermal and turbulent broaden- 
ing of the spectrum, it is reasonable to adopt fixed priors. 
Here we adopt uninformative priors, but describe the choice 
of priors in some detail in case more informative priors are 
desired. 

• Mixture weights ujj. The prior for the mixture 
weights, (tJi, . . . ,cjfc) ~ Dir(ai, . . . , ah) is a Dirichlet dis- 
tribution, which is conjugate to the multinomial distribu- 
tion, the following sense: given a prior Dir(ai, . . . , at), then 
if (61, ... , bk) is the number of occurrences of each event i in 
a sample of n events, the posterior is Dir(a+ b). If there is 
no prior information to favor one component over another, 
then it is common practice to set all members of the prior 
a,i to a common value a, known as the concentration pa- 
rameter; n <g 1 favors concentration amongst a few compo- 
nents, whilst a> 1 favors almost equal dispersion amongst 
all components. Here, we adopt a Dirichlet(l, . . . , 1) prior, 
which is equivalent to a uniform prior on uj, under the con- 
straint that Ylj=i u i = 1- 

• Line centers fij . We adopt a normal prior for the line 
centers, pj ~ Af(flj,Tj), where jlj are the known line centers 
in the absence of turbulent motions. We set a very weak 
prior flj — [i, where pL is the mean energy of all photons, 
and Tj = (c s /(\/3c)i5 , where c a /y/3 corresponds to typical 
line of sight velocities of transonic bulk motions, and E is 
the line center energy. 

• Line widths of. The prior is the inverse Gamma distri- 
bution, of ~ Inv — Gamma(«j, the inverse Gamma dis- 
tribution has the desirable property that it is bounded below 
at zero, and is defined such that 1/of obeys a Gamma dis- 
tribution. Two constraints are needed to determine the two 
parameters of the distribution, (ctj,f3j). Since the inverse- 
Gamma distribution is highly asymmetric, with a long tail 
to high values of of, we do not use estimates for the first two 
moments of the distribution to set (aj,/3j). Instead, we set 
the mode of the distribution to o-^ odc = of ow = /3i/(ai + 1), 
where af ow = of hcrm -|-af natrum is the line width in the absence 
of turbulence; i.e., due to thermal and instrumental broaden- 
ing alone. The asymmetric nature of the Inv-Gamma distri- 
bution generally implies that P(a 2 < o^ odc ) ~ 0.1 — 0.2. 
We require that P(a 2 > cr^ igh ) = 0.1, where ofj igh = 

°fow + "f.niaxJ and v f\,max is sucn that ^turb ~ 0.5[/ t her m 

for isotropic turbulence. Together, these two requirements 
allow a, P to be determined, and result in ~ 70 — 80% of 
samples drawn from the prior to lie between af ovr and oj| igh . 
This fairly loose prior allows a 2 to be mostly driven by the 
data. A similar procedure can be used for more informative 
priors. 

Our goal is to calculate the posterior of the model pa- 
rameters, given the prior distribution and the data, and draw 
samples from this posterior in a numerically tractable man- 
ner. Equation fT} in is the probability of drawing a data 
point with value x, given the model parameters 9. Since 
data points are independent, the likelihood of a data set is 
the product of the probabilities for each data point: 

C{x\9) = \[f{x l \9). (A4) 

i 

The posterior probability of the model parameters 9, given 
the data x and priors for the model parameters p(9\9), where 



9 are the known hyper-parameters, is then simply: 

P{9\x,9) oc C{x\9)p{9\9). (A5) 

In numerical work, we will usually work with the logarithm 
of the posterior, rather than the posterior itself, to avoid 
problems with underflow and rounding to zero. However, 
from equation fT}, evaluating the posterior involves the sum 
of Gaussians, each of which may be so small as to under- 
flow when reconstructed from their logarithms . One must 
theref ore use the log-sum-exp formula (e.g., see lPress et al.l 
<HQ03)): 

log CXp(2i)^ = Zmax + log exp(jZ, - JZ ma x)^ , 

(A6) 

where Zi are the logarithms of small quantities and z max is 
their maximum. This guarantees that at least one exponen- 
tiation won't underflow, and any that do could have been 
neglected anyhow. 

We now draw Markov-Chain Monte Carlo (MCMC) 
samples from the posterior distribution. The Metropolis- 
Hastings algorithm for drawing MCMC samples is prob- 
ably the one most familiar to readers; for instance, it is 
the heart of the wide ly used software package COSMOMC 
i Lewis fc Bridlell2002h . which we also used previously. Here 
we use instead Gibbs sampling, which is the most com- 
monly used approach in Bayesian mixture estimate (e.g., 
iMarin. Mengersen fc Robert! (|2005l ). and references therein); 
it is in fa ct a special case of the Metropolis-H astings 
algorithm (|Gilks. Richardson fc Sp icgclhalter] 1 19961 ). even 
though historically it was developed separately. In Gibbs 
sampling, one draws each parameter 9i from its full condi- 
tional distribution, which is obtained by holding all com- 
ponents of 9 constant except for 9i, and sampling from the 
posterior as a function of 9i alone. It can be shown that 
(unlike the Metropolis-Hastings algorithm) the acceptance 
probability is one in this case, i.e., we can always accept 
a sample drawn from the conditional distribution P(9i\9~) 
(where 9~ denotes "values of all parameters except one"). 
To draw a sample, one simply cycles through each compo- 
nent of 9 in turn. Note that, unlike the Metropolis-Hastings 
method, each component of 9 gets reset to a value com- 
pletely independent of its previous value (although 9~ does 
depend on previous values; thus, we still have a Markov 
chain). Thus, in principle larger steps in parameter space 
are possible than with MH samplin J^l since with the latter, 
large multi-variate steps will have low acceptance probabili- 
ties and almost certainly be rejected. Despite these obvious 
advantages, a major disadvantage of Gibbs sampling is that 
it requires the full conditional distribution to be correctly 
normalized (unlike MH sampling, where an unnormalized 
posterior is sufficient), and thus the numerically expensive 
computation of normalizing constants for each 9~ (as well as 
a practical means of drawing 9i from each conditional dis- 
tribution). Fortunately, for Gaussian mixture models, the 
use of data augmentation and conjugate priors allows for 



However, despite this, Gibbs sampling does not always en- 
joy good convergence properti es, and one has to be wary of get- 
ting trapped in local minima llGilks. R ichardson & Spicgc lhalterl 
1X9961) . 
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analytic conditional distributions, for which there are well- 
known random number generators. 

We adopt a data augmentation approach to the "miss- 
ing data" problem previously mentioned: the particular mix- 
ture to which a data point belongs is unknown, obviating 
a straightforward calculation of model parameters. We can 
treat this unknown membership by assigning to each ob- 
servation Xi a new variable Zj £ 1,2, ... ,k which indicates 
which mixture the data point belongs to. Given the aug- 
mented membership variable Zi, equation {TJ then reduces 
to: 



f(xi\zi =j,6) ~ ./>(>•. /'.;•",) 



(A7) 



where fj(x\fij,aj) is the jth normal mixture component. Of 
course, we do not know the z%, but we know their proba- 
bility distribution: since the probability that a data point 
Xi belongs to a mixture j is pj oc Wj fj(xi\fj,j , erf) (normal- 
ized such that Y2jPi = 1)> the Zi belong to a multinomial 
distribution Zi ~ Mj(l;pi, . . . ,Pk)- 

With this membership variable Zi and the conjugate pri- 
ors in equations (|A1|) . (|A2|) . (|A3|) . we can compute the con- 
ditional distributions, which all take the functional form of 
their conjugate priors. There are well-established algorithms 
for drawing random numbers from all of these analytic dis- 
tributions. The mixing fractions uj are Dirichlet distributed: 



Dir(ai + ni, . . . ,a k + n k ) 



(A8) 



where rij are the number of data points in mixture j, ob- 
tained by summing over the indicator variables z%. We can 
immediately see that the hyper-parameters we have chosen 
a%,...,ak = 1 will have little influence for a large sample. 
The means fj,j are Gaussian distributed: 



I'j 



7? H 



V 



+ 



(A9) 



where the sum runs over all data points x% which belong to 
mixture j, as indicated by the membership variable Zi — j. 
The variances are drawn from an inverse-Gamma distribu- 
tion: 



i] ~ Inv - Gamma ay + + 



(A10) 

Finally, at every iteration we have to draw a new set of 
indicator variables Zi from a multinomial distribution: 



■M(l;pi 



,Pk) 



(All) 



&ndpj(xi) oc Wjfj(xi\iXj,(7j) (normalized such that X^jPi = 
1). A set of each of these random draws then comprises a 
single Monte-Carlo sample from the posterior distribution. 
In our calculations, we find that computing time is heavily 
dominated by random draws from the multinomial random 
number generator. We therefore parallelized the multinomial 
random number generator, and found that this allowed a 
significant (close to linear) reduction in computation timJ^l 



As before, we deal with the "label-switching" problem 
by demanding that /ii < ^2- Note that this is still an area of 
active research (e.g., see lJasraT Holmes fc Stephens! l|2005l )1: 
as yet there is no consensus solution in the statistical com- 
munity. Alternatively, we find that if we enforce asymmetric 
priors, this is usually sufficient to break the symmetry in the 
modes of the posterior and single out a single one. 

We have experimented with running this Gibbs sam- 
pling MCMC and the Cash-C Metropolis-Hastings MCMC 
described in the text, and found that they give similar re- 
sults. For instance, it gives almost exactly the same results 
for the single line SD case given in Table 2] and similar re- 
sults for the WD case given in Table [3] Interestingly, often 
the trace plots of the Gibbs sampling MCMC method for 
the WD case show that it has somewhat poorer mixing and 
convergence properties — indicating that it is more badly af- 
fected by parameter degeneracies. The greater robustness 
of the method we used in this paper stems from the use 
of covariance matrix inform ation to improve the proposal 
distribution in COSMOMC |Lewis fc Bridld l2002h ; a sim- 
ilar means of orthogonalizing parameters should be imple- 
mented here to ensure robustness. Overall, there is no reason 
to use full mixture modeling for the purposes of this paper. 
However, the algorithm described in this Appendix should 
be preferred when there are fewer data points, and binning 
is deprecated. 



15 An alternative would be to run multiple MCMC streams in 
parallel. However, due to the long burn in period for certain re- 
gions of parameter space, this strategy is less efficient. 
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